Identifying the true dimensionality of a dataset, and therefore which principal components (PCs) are most significant, can be uncertain. The elbow plot method ranks principal components by the percentage of variance each one explains. In this example, an elbow (i.e. the point where the curve flattens into a straight line) is visible somewhere between PC 20 and PC 25, suggesting that the majority of the true signal is captured in the first 21 PCs.
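The elbow reading above is done by eye, but the same idea can be sketched numerically. The snippet below is a minimal illustration in Python, not the procedure used to produce this report: it converts per-PC standard deviations into percentage of variance explained, then reports the PC that follows the last drop larger than a threshold. The function names and the `min_drop=0.1` threshold are assumptions chosen for the example.

```python
def percent_variance(stdevs):
    """Percentage of total variance explained by each PC, given per-PC standard deviations."""
    variances = [s * s for s in stdevs]
    total = sum(variances)
    return [100.0 * v / total for v in variances]


def elbow_by_drop(pct, min_drop=0.1):
    """1-based index of the PC that follows the last drop larger than min_drop.

    min_drop is in percentage points and is an assumed, tunable threshold.
    """
    elbow = 1
    for i in range(len(pct) - 1):
        if pct[i] - pct[i + 1] > min_drop:
            elbow = i + 2  # 1-based position of the PC right after this drop
    return elbow


# Toy percentages (hypothetical, not taken from this dataset):
toy_pct = [40, 20, 10, 5, 4.95, 4.9, 4.85]
toy_elbow = elbow_by_drop(toy_pct)
```

A heuristic like this is best treated as a cross-check on the plot rather than a replacement for it, since the result depends on the chosen threshold.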
##
##    0    1    2    3    4    5    6    7    8    9
## 7283 4410 3335 3044 2962 2231 1602  380  281  237
##    expt.type    0    1    2    3    4    5    6   7   8   9
## 1:     Naive 2738 1410  866  771 1251  714   66 147 146  48
## 2:  Inflamed 4545 3000 2469 2273 1711 1517 1536 233 135 189
## [1] "Naive-cell count"
## [1] 8157
## [1] "Inflamed-cell count"
## [1] 17608
## # A tibble: 20 × 4
## # Groups:   Condition [2]
##    Cluster_ID Condition Count Percent
##    <fct>      <fct>     <dbl>   <dbl>
##  1 0          Naive      2738  33.6
##  2 1          Naive      1410  17.3
##  3 2          Naive       866  10.6
##  4 3          Naive       771   9.45
##  5 4          Naive      1251  15.3
##  6 5          Naive       714   8.75
##  7 6          Naive        66   0.809
##  8 7          Naive       147   1.80
##  9 8          Naive       146   1.79
## 10 9          Naive        48   0.588
## 11 0          Inflamed   4545  25.8
## 12 1          Inflamed   3000  17.0
## 13 2          Inflamed   2469  14.0
## 14 3          Inflamed   2273  12.9
## 15 4          Inflamed   1711   9.72
## 16 5          Inflamed   1517   8.62
## 17 6          Inflamed   1536   8.72
## 18 7          Inflamed    233   1.32
## 19 8          Inflamed    135   0.767
## 20 9          Inflamed    189   1.07
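The `Percent` column above is each cluster's share of the cells within its own condition, which makes the two conditions comparable even though their totals differ (8157 Naive cells versus 17608 Inflamed cells). As a minimal sketch, written in Python purely for illustration, the calculation can be reproduced from the per-cluster counts printed earlier; the function name `cluster_percentages` is an assumption.

```python
# Per-cluster cell counts (clusters 0-9), copied from the output above.
counts = {
    "Naive": [2738, 1410, 866, 771, 1251, 714, 66, 147, 146, 48],
    "Inflamed": [4545, 3000, 2469, 2273, 1711, 1517, 1536, 233, 135, 189],
}


def cluster_percentages(counts):
    """Return (cluster_id, condition, count, percent) rows, with percent taken within each condition."""
    rows = []
    for condition, per_cluster in counts.items():
        total = sum(per_cluster)  # total cells in this condition
        for cluster_id, n in enumerate(per_cluster):
            rows.append((cluster_id, condition, n, 100.0 * n / total))
    return rows


rows = cluster_percentages(counts)
for cluster_id, condition, n, pct in rows:
    print(f"{cluster_id}\t{condition}\t{n}\t{pct:.3g}")
```

Read this way, cluster 6 stands out: it holds 0.809% of Naive cells but 8.72% of Inflamed cells.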
Here, Inflamed is the treatment group and Naive is taken as the control group. The comparison starts with C0 cells in Inflamed against C0-C9 cells in Naive and ends with C9 cells in Inflamed against C0-C9 cells in Naive. The top and bottom 10 values are selected based on the absolute values of p_val_adj and avg_log2FC.
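The text does not spell out exactly how the two columns are combined, so the following Python sketch is only one plausible reading: keep rows whose `p_val_adj` falls below a cutoff, rank them by signed `avg_log2FC`, and take the `n` highest and `n` lowest. The cutoff `alpha=0.05`, the function name `top_and_bottom`, and the gene names in the toy data are all assumptions for illustration.

```python
def top_and_bottom(markers, n=10, alpha=0.05):
    """Split significant markers into the n most up-regulated and n most down-regulated.

    markers: list of dicts with keys "gene", "p_val_adj" and "avg_log2FC".
    alpha:   assumed significance cutoff on p_val_adj (not stated in the text).
    """
    significant = [m for m in markers if m["p_val_adj"] < alpha]
    ranked = sorted(significant, key=lambda m: m["avg_log2FC"], reverse=True)
    top = ranked[:n]
    # Take the bottom n from what is left, so no gene appears in both lists.
    bottom = ranked[len(top):][-n:]
    return top, bottom


# Hypothetical rows, not taken from this dataset.
toy_markers = [
    {"gene": "geneA", "p_val_adj": 1e-10, "avg_log2FC": 2.5},
    {"gene": "geneB", "p_val_adj": 1e-4, "avg_log2FC": -1.8},
    {"gene": "geneC", "p_val_adj": 0.6, "avg_log2FC": 3.0},  # dropped: not significant
    {"gene": "geneD", "p_val_adj": 1e-3, "avg_log2FC": 0.4},
    {"gene": "geneE", "p_val_adj": 1e-6, "avg_log2FC": -0.2},
]
toy_top, toy_bottom = top_and_bottom(toy_markers, n=1)
```

With positive `avg_log2FC` taken to mean higher expression in Inflamed than in Naive, the top list holds genes up in the treatment group and the bottom list holds genes down in it.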
Clustering is a core tool for analysing single-cell RNA-sequencing (scRNA-seq) datasets. The clustering is primarily controlled by two parameters: the number of principal components and the resolution. A clustering tree visualises the relationships between clusters at a range of resolutions.
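The edges of such a tree come from tracking where cells go as the resolution increases. As a minimal sketch in Python (an illustration of the idea, not the plotting code behind this report), the snippet below takes per-cell cluster labels at two adjacent resolutions and returns, for each parent-child pair, the number of shared cells and the fraction of the child cluster that came from that parent. The function name `tree_edges` and the toy labels are assumptions.

```python
from collections import Counter


def tree_edges(low_res, high_res):
    """Map (parent, child) -> (shared cell count, fraction of the child's cells from that parent).

    low_res, high_res: per-cell cluster labels at a lower and a higher resolution,
    given in the same cell order.
    """
    if len(low_res) != len(high_res):
        raise ValueError("label vectors must cover the same cells")
    pair_counts = Counter(zip(low_res, high_res))
    child_sizes = Counter(high_res)
    return {
        (parent, child): (n, n / child_sizes[child])
        for (parent, child), n in pair_counts.items()
    }


# Toy labels for six cells (hypothetical, not from this dataset).
low = [0, 0, 0, 1, 1, 1]
high = [0, 0, 1, 1, 2, 2]
edges = tree_edges(low, high)
```

A child cluster that draws most of its cells from a single parent indicates a clean split, while a child fed by several parents suggests the clustering is becoming unstable at that resolution.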